Abstract: We consider a problem of significant practical importance,
namely, the reconstruction of a low-rank data matrix
from a small subset of its entries. This problem appears in many 
areas such as collaborative filtering, computer vision and wireless 
sensor networks. In this paper, we focus on the matrix completion 
problem in the case when the observed samples are corrupted 
by noise. We compare the performance of three state-of-the-art 
matrix completion algorithms (OptSpace, ADMiRA and FPCA) 
on a single simulation platform and present numerical results. 
We show that in practice these efficient algorithms can be used 
to reconstruct real data matrices, as well as randomly generated 
matrices, accurately. 

I. Introduction 

We consider the problem of reconstructing an $m \times n$ low-rank
matrix $M$ from a small set of observed entries, possibly
corrupted by noise. This problem is of considerable practical
interest and has many applications. One example is collaborative
filtering, where users submit rankings for small subsets of,
say, movies, and the goal is to infer the preferences for unrated
movies for a recommendation system [1]. It is believed that the
movie-rating matrix is approximately low-rank, since only a
few factors contribute to a user's preferences. Other examples
of matrix completion include the problem of inferring
3-dimensional structure from motion [2] and triangulation from
incomplete data of distances between wireless sensors [3].

A. Prior and related work 

On the theoretical side, most recent work focuses on algorithms
for exactly recovering the unknown low-rank matrix
and on providing an upper bound on the number of observed
entries that guarantees successful recovery with high probability,
when the observed set is drawn uniformly at random
over all subsets of the same size. The main assumptions of
this matrix completion problem with exact observations are that
the matrix $M$ to be recovered has rank $r \ll m, n$ and that
the observed entries are known exactly. Adopting techniques
from compressed sensing, Candès and Recht introduced a
convex relaxation of the NP-hard problem of finding a
minimum-rank matrix matching the observed entries [4]. They
introduced the concept of the incoherence property and proved that,
for a matrix $M$ of rank $r$ which has the incoherence property,
solving the convex relaxation correctly recovers the unknown
matrix, with high probability, if the number of observed entries
$|E|$ satisfies $|E| \ge C r n^{6/5} \log n$.



Recently, [3] improved the bound to
$|E| \ge C r n \max\{\log n, r\}$ with an extra condition that the
matrix has bounded condition number, where the condition number
of a matrix is defined as the ratio between the largest
singular value and the smallest singular value of $M$. We
introduced an efficient algorithm called OptSpace, based on
spectral methods followed by a local manifold optimization.
For bounded rank $r = O(1)$, the performance bound of
OptSpace is order optimal [5]. Candès and Tao proved a
similar bound $|E| \ge C n r (\log n)^6$ under a stronger assumption
on the original matrix $M$, known as the strong incoherence
condition, but without any assumption on the condition
number of the matrix $M$ [6]. For any value of $r$, it is
suboptimal only by a poly-logarithmic factor.

While most theoretical work focuses on proving bounds for
the exact matrix completion problem, a more interesting and
practical problem arises when the matrix $M$ is only approximately
low rank or when the observations are corrupted by noise. The
main focus of matrix completion with noisy observations
is to design an algorithm that finds an $m \times n$ low-rank matrix
$\hat{M}$ that best approximates the original matrix $M$, and to
provide a bound on the root mean squared error (RMSE), given by

$$ \mathrm{RMSE} = \frac{1}{\sqrt{mn}} \, \|M - \hat{M}\|_F . \tag{1} $$
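
As a minimal illustration of Eq. (1), the following Python sketch computes the RMSE between a matrix and an estimate of it. The function name `rmse` and the toy matrices are assumptions made for this example; they are not taken from the implementations compared in this paper.

```python
import numpy as np

def rmse(M, M_hat):
    """Root mean squared error ||M - M_hat||_F / sqrt(m * n), as in Eq. (1)."""
    m, n = M.shape
    return np.linalg.norm(M - M_hat, ord="fro") / np.sqrt(m * n)

# Toy example: every entry of the estimate is off by exactly 0.5,
# so the RMSE is 0.5 as well.
M = np.array([[1.0, 2.0], [3.0, 4.0]])
M_hat = M + 0.5
error = rmse(M, M_hat)
print(error)
```

Note the normalization by $\sqrt{mn}$: the RMSE is an average per-entry error, which makes values comparable across matrix sizes.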



Candès and Plan introduced a generalization of the convex
relaxation from [4] to the noisy case, and provided a bound
on the RMSE [7]. More recently, a bound on the RMSE
achieved by the OptSpace algorithm with noisy observations
was obtained in [8]. This bound is order optimal in a number
of situations and improves over the analogous result in [7].
A detailed comparison of these two results is provided in
Section II-D.

On the practical side, directly solving the convex relaxation
introduced in [4] requires solving a Semidefinite Program
(SDP), the complexity of which grows proportional to $n^3$.
Recently, many authors have proposed efficient algorithms
for solving the low-rank matrix completion problem. These
include the Accelerated Proximal Gradient (APG) algorithm [9],
Fixed Point Continuation with Approximate SVD (FPCA) [10],
Atomic Decomposition for Minimum Rank Approximation
(ADMiRA) [11], Soft-Impute [12], Subspace Evolution
and Transfer (SET) [13], Singular Value Projection (SVP)
[14], and OptSpace [5]. In this paper, we provide numerical
comparisons of the performance of three state-of-the-art
algorithms, namely OptSpace, ADMiRA and FPCA, and
show that these efficient algorithms can be used to reconstruct
real data matrices, as well as randomly generated matrices,
accurately.

B. Outline 

The organization of this paper is as follows. In Section II,
we describe the matrix completion problem and efficient
algorithms to solve it when the observations are corrupted
by noise. Section III discusses the results of numerical
simulations and compares the performance of three matrix
completion algorithms with respect to speed and accuracy.

II. The model definition and algorithms 

A. Model definition 

The matrix $M$ has dimensions $m \times n$, and we define
$\alpha = m/n$ to denote the ratio. In the following we assume,
without loss of generality, $\alpha \ge 1$. We assume that the matrix
$M$ has exact low rank $r \ll n$, that is, there exist matrices $U$
of dimensions $m \times r$, $V$ of dimensions $n \times r$, and a diagonal
matrix $\Sigma$ of dimensions $r \times r$, such that

$$ M = U \Sigma V^T . $$

Notice that for a given matrix $M$, the factors $(U, V, \Sigma)$ are not
unique. Further, each entry of $M$ is perturbed, thus producing
an 'approximately' low-rank matrix $N$, with

$$ N_{ij} = M_{ij} + Z_{ij} , $$

where the matrix $Z$ accounts for the noise.

Out of the $m \times n$ entries of $N$, a subset $E \subseteq [m] \times [n]$ is
observed. Let $N^E$ be the $m \times n$ observed matrix with all the
observed values, such that

$$ N^E_{ij} = \begin{cases} N_{ij} & \text{if } (i,j) \in E , \\ 0 & \text{otherwise.} \end{cases} $$

Our goal is to find a low-rank estimate $\hat{M}(N^E, E)$ of the
original matrix $M$ from the observed noisy matrix $N^E$ and
the set of observed indices $E$.

B. Algorithms 

In the case when there is no noise, that is $N_{ij} = M_{ij}$,
solving the following optimization problem will recover the
original matrix correctly, if the number of observed entries
is large enough.

$$ \begin{aligned} \text{minimize} \quad & \operatorname{rank}(X) \\ \text{subject to} \quad & \mathcal{P}_E(X) = \mathcal{P}_E(M) , \end{aligned} \tag{2} $$

where $X \in \mathbb{R}^{m \times n}$ is the variable matrix, $\operatorname{rank}(X)$ is the rank
of matrix $X$, and $\mathcal{P}_E(\cdot)$ is the projector operator defined as

$$ \mathcal{P}_E(M)_{ij} = \begin{cases} M_{ij} & \text{if } (i,j) \in E , \\ 0 & \text{otherwise.} \end{cases} \tag{3} $$

This problem finds the matrix with the minimum rank that
matches all the observations. Notice that the solution of
problem (2) is optimal: if this problem does not recover the
correct matrix $M$, then there exists at least one other rank-$r$
matrix that matches all the observations, and no other algorithm
can distinguish which one is the correct solution. However, this
optimization problem is NP-hard and all known algorithms
require doubly exponential time in $n$ [4].

In compressed sensing, minimizing the $\ell_1$ norm of a vector
is the tightest convex relaxation of minimizing the $\ell_0$ norm,
or equivalently minimizing the number of non-zero entries,
for sparse signal recovery. We can adapt this idea to matrix
completion, where $\operatorname{rank}(\cdot)$ of a matrix corresponds to the $\ell_0$ norm
of a vector, and the nuclear norm to the $\ell_1$ norm [4], where the nuclear
norm of a matrix is defined as the sum of its singular values.

$$ \begin{aligned} \text{minimize} \quad & \|X\|_* \\ \text{subject to} \quad & \mathcal{P}_E(X) = \mathcal{P}_E(M) , \end{aligned} \tag{4} $$

where $\|X\|_*$ denotes the nuclear norm of $X$.
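
To make the objects in problems (2) to (4) concrete, the sketch below computes the projector $\mathcal{P}_E$, the rank, and the nuclear norm for a small matrix with NumPy. The helper names `project` and `nuclear_norm`, and the representation of $E$ as a boolean mask, are assumptions chosen for this illustration.

```python
import numpy as np

def project(X, mask):
    """P_E(X): keep the entries where mask is True and zero out the rest."""
    return np.where(mask, X, 0.0)

def nuclear_norm(X):
    """Sum of the singular values of X."""
    return float(np.linalg.svd(X, compute_uv=False).sum())

X = np.diag([3.0, 2.0, 0.0])
mask = np.eye(3, dtype=bool)          # E = diagonal positions only

rank_X = np.linalg.matrix_rank(X)     # number of non-zero singular values
nuc_X = nuclear_norm(X)               # 3 + 2 + 0
observed = project(np.ones((3, 3)), mask)
print(rank_X, nuc_X)
print(observed)
```

The rank counts the non-zero singular values, while the nuclear norm sums them; this mirrors the relation between the $\ell_0$ and $\ell_1$ norms of a vector.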

In this paper, we are interested in the more practical case
when the observations are contaminated by noise or the
original matrix to be reconstructed is only approximately low
rank. In this case, the constraint $\mathcal{P}_E(X) = \mathcal{P}_E(M)$ must be
relaxed. This results in either the problem [7], [10], [9], [12]

$$ \begin{aligned} \text{minimize} \quad & \|X\|_* \\ \text{subject to} \quad & \|\mathcal{P}_E(X) - \mathcal{P}_E(N)\|_F \le \Theta \end{aligned} \tag{5} $$

or its Lagrangian version

$$ \text{minimize} \quad \mu \|X\|_* + \frac{1}{2} \|\mathcal{P}_E(X) - \mathcal{P}_E(N)\|_F^2 . \tag{6} $$

In the following, we briefly explain the objective of the three
state-of-the-art matrix completion algorithms based on the
relaxation, namely FPCA, ADMiRA, and OptSpace.

FPCA, introduced in [10], is an efficient algorithm for
solving the convex relaxation, which is a nuclear norm
regularized least squares problem as in (6). Following the same line
of argument given in [7], we choose the regularization parameter
$\mu$ as a function of $p$ and $\sigma$, where
$p = |E|/mn$ and $\sigma^2$ is the variance of each entry in $Z$.

ADMiRA, introduced in [11], is an efficient algorithm
which is based on atomic decomposition and extends
the idea of Compressive Sampling Matching Pursuit
(CoSaMP) [15]. ADMiRA is an iterative method for solving
the following rank-$r$ matrix approximation problem.

$$ \begin{aligned} \text{minimize} \quad & \|\mathcal{P}_E(X) - \mathcal{P}_E(N)\|_F \\ \text{subject to} \quad & \operatorname{rank}(X) \le r . \end{aligned} \tag{7} $$



One drawback of ADMiRA is that it requires the prior 
knowledge of the rank of the original matrix M. In the 
following numerical simulations, for fair comparison, we first 
run a rank estimation algorithm to guess the rank of the 
original matrix and use the estimated rank in ADMiRA. The 
rank estimation algorithm is explained in the next section. 

OptSpace, introduced in [5], is a novel and efficient
algorithm based on the spectral method followed by a local
optimization, which consists of the following three steps.

1. Trim the matrix $N^E$.

2. Compute the rank-$r$ projection of the trimmed observation
matrix.

3. Minimize $\|\mathcal{P}_E(X S Y^T) - \mathcal{P}_E(N)\|_F^2$ through gradient
descent, using the rank-$r$ projection as the initial guess.

In the trimming step, we set to zero all columns in $N^E$ with
the number of samples larger than $2|E|/n$ and set to zero all
rows with the number of samples larger than $2|E|/m$. In the
second step, the rank-$r$ projection of a matrix $A$ is defined as

$$ \mathcal{P}_r(A) = \frac{mn}{|E|} \sum_{i=1}^{r} \sigma_i u_i v_i^T , \tag{8} $$

where the SVD of $A$ is given by $A = \sum_{i} \sigma_i u_i v_i^T$. The
basic idea is that the rank-$r$ projection of the trimmed observation
matrix provides an excellent initial guess, so that the
standard gradient descent provides a good estimate after this
initialization. Note that we need to estimate the target rank $r$.
To estimate the target rank r for ADMiRA and OptSpace, 
we used the following simple rank estimation procedure. 
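
The trimming and rank-$r$ projection steps of OptSpace described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the OptSpace MATLAB code: the function names, the boolean `mask` encoding of $E$, and the optional `scale` argument (a hypothetical hook for any rescaling applied to the truncated SVD) are assumptions of this example.

```python
import numpy as np

def trim(N_E, mask):
    """Zero out columns and rows that hold too many observed samples."""
    m, n = N_E.shape
    num_obs = mask.sum()                                   # |E|
    trimmed = N_E.copy()
    trimmed[:, mask.sum(axis=0) > 2 * num_obs / n] = 0.0   # heavy columns
    trimmed[mask.sum(axis=1) > 2 * num_obs / m, :] = 0.0   # heavy rows
    return trimmed

def rank_r_projection(A, r, scale=1.0):
    """Top-r part of the SVD of A, multiplied by an optional scale."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return scale * (U[:, :r] * s[:r]) @ Vt[:r, :]

# Toy example: column 0 is fully observed (4 samples), which exceeds
# the threshold 2|E|/n = 2.5, so it is set to zero by the trimming step.
mask = np.zeros((4, 4), dtype=bool)
mask[:, 0] = True
mask[0, 1] = True
N_E = np.where(mask, 1.0, 0.0)
trimmed = trim(N_E, mask)

# Rank-2 projection of diag(3, 2, 1) keeps the singular values 3 and 2.
A2 = rank_r_projection(np.diag([3.0, 2.0, 1.0]), r=2)
print(trimmed)
print(A2)
```

The gradient descent of step 3 is not shown; it would start from the factors of the projection computed here.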

C. Rank estimation algorithm 

Let $\tilde{N}^E$ be the trimmed version of $N^E$. By singular value
decomposition of the trimmed matrix, we have

$$ \tilde{N}^E = \sum_{i=1}^{\min(m,n)} \sigma_i x_i y_i^T , $$

where $x_i$ and $y_i$ are the left and right singular vectors
corresponding to the $i$th singular value $\sigma_i$. Then, the following
cost function is defined in terms of the singular values.



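$$ R(i) = \frac{\sigma_{i+1} + \sigma_1 \, [\ldots]}{\sigma_i} , $$

where the bracketed factor multiplying $\sigma_1$ depends on $|E|$.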



Finally, the estimated rank is the index $i$ that minimizes the
cost function $R(i)$.

The idea behind this algorithm is that, if enough entries of
$N$ are revealed and there is little noise, then there is a clear
separation between the first $r$ singular values, which reveal the
structure of the matrix $M$ to be reconstructed, and the spurious
ones. Hence, $\sigma_{i+1}/\sigma_i$ is minimum when $i$ is the correct
rank $r$. The second term is added to ensure the robustness of
the algorithm.
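
The idea of locating the gap in the singular values can be sketched as follows. This is a simplified illustration, not the procedure used in the experiments: the cost is the ratio $\sigma_{i+1}/\sigma_i$ plus a penalty proportional to $\sigma_1 \sqrt{i}/\sigma_i$, where the form of the penalty, the weight `penalty`, and the cap `max_rank` are all assumptions introduced for this example.

```python
import numpy as np

def estimate_rank(A, penalty=0.0, max_rank=None):
    """Index i minimizing (s[i+1] + penalty * s[1] * sqrt(i)) / s[i] (1-based).

    The penalty term is a hypothetical stand-in for a robustness term.
    """
    s = np.linalg.svd(A, compute_uv=False)
    k = len(s) - 1 if max_rank is None else min(max_rank, len(s) - 1)
    idx = np.arange(1, k + 1)
    cost = (s[1:k + 1] + penalty * s[0] * np.sqrt(idx)) / s[:k]
    return int(idx[np.argmin(cost)])

# Fully observed rank-3 matrix with small additive noise: the ratio of
# consecutive singular values drops sharply at i = 3.
rng = np.random.default_rng(0)
M = rng.standard_normal((80, 3)) @ rng.standard_normal((3, 40))
N = M + 0.01 * rng.standard_normal((80, 40))
r_hat = estimate_rank(N, penalty=0.01, max_rank=10)
print(r_hat)
```

Without a penalty, a small ratio between two spurious singular values could be mistaken for the gap, which is one motivation for adding a second term.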

D. Comparison of the performance guarantees 

Performance guarantees for the matrix completion problem with
noisy observations are proved in [7] and [8]. Theorem 7 of
[7] shows the following bound on the performance of solving the
convex relaxation (5) under some constraints on the matrix $M$
known as the strong incoherence property.

$$ \mathrm{RMSE} \le 7 \sqrt{\frac{n}{|E|}} \, \|\mathcal{P}_E(Z)\|_F + \frac{2}{n\sqrt{\alpha}} \, \|\mathcal{P}_E(Z)\|_F , \tag{9} $$

where RMSE is defined in Eq. (1). The constant in front of
the first term is in fact slightly smaller than 7 in [7], but in
any case larger than $4\sqrt{2}$.

Theorem 1.2 of [8] shows the following bound on the
performance of OptSpace under the assumptions that $M$
is incoherent and has a bounded condition number
$\kappa = \sigma_1(M)/\sigma_r(M)$, where the condition number of a matrix is
defined as the ratio between the largest singular value $\sigma_1(M)$
and the smallest singular value $\sigma_r(M)$ of $M$.

$$ \mathrm{RMSE} \le C \kappa^2 \, \frac{n \sqrt{\alpha r}}{|E|} \, \|\mathcal{P}_E(Z)\|_2 , \tag{10} $$

for some numerical constant $C$.

Although the assumptions of the above two theorems are
not directly comparable, as far as the error bounds are concerned,
the bound (10) improves over the bound (9) in several
respects: the bound (10) does not have the second term in
the bound (9), which actually grows with the number of observed
entries; the bound (10) decreases as $n/|E|$ rather than
$(n/|E|)^{1/2}$; the bound (10) is proportional to the operator
norm of the noise matrix $\|\mathcal{P}_E(Z)\|_2$ instead of the Frobenius
norm $\|\mathcal{P}_E(Z)\|_F \ge \|\mathcal{P}_E(Z)\|_2$. For $E$ uniformly random, one
expects $\|\mathcal{P}_E(Z)\|_F$ to be roughly of order $\|\mathcal{P}_E(Z)\|_2 \sqrt{n}$. For
instance, if the entries of $Z$ are i.i.d. Gaussian with bounded
variance $\sigma^2$, $\|\mathcal{P}_E(Z)\|_F = \Theta(\sqrt{|E|})$ while $\|\mathcal{P}_E(Z)\|_2$ is of
order $\sqrt{|E|/n}$.
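
The scaling of the two norms can be checked numerically. The sketch below samples a noise matrix with i.i.d. Gaussian entries, keeps each entry with probability $p$, and compares the Frobenius and operator norms with $\sqrt{|E|}$ and $\sqrt{|E|/n}$. The sizes `n = 400` and `p = 0.25` are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma = 400, 0.25, 1.0

mask = rng.random((n, n)) < p                        # uniformly random E
PZ = np.where(mask, sigma * rng.standard_normal((n, n)), 0.0)
num_obs = mask.sum()                                 # |E|

fro = np.linalg.norm(PZ, "fro")                      # Frobenius norm
op = np.linalg.norm(PZ, 2)                           # largest singular value

fro_ratio = fro / np.sqrt(num_obs)                   # close to sigma
op_ratio = op / np.sqrt(num_obs / n)                 # a constant of order one
print(fro_ratio, op_ratio, fro / op)
```

The last printed value shows the gap of order $\sqrt{n}$ between the two norms, which is what makes a bound in terms of the operator norm the tighter one.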

In the following, we numerically compare the performances 
of three efficient algorithms, OptSpace, ADMiRA and 
FPCA, for solving the matrix completion problem, with real 
data matrices as well as randomly generated matrices. 

III. Numerical results 

In this section, we present numerical comparisons between
three approximate low-rank matrix completion algorithms:
OptSpace, ADMiRA and FPCA. The performance of
each algorithm is compared in terms of the relative root
mean squared error defined as in Eq. (1), for randomly
generated matrices in Section III-A and real data matrices
in Section III-B. We used MATLAB implementations
of the algorithms and tested them on a 3.0 GHz desktop
computer with 2 GB RAM. FPCA is available from
www.columbia.edu/~sm2756/FPCA.htm and OptSpace is
available from www.stanford.edu/~raghuram/optspace/.

A. Numerical results with randomly generated matrices 

For numerical simulations with randomly generated matrices,
we use $n \times n$ test matrices $M$ of rank $r$ generated as
$M = U V^T$, where $U$ and $V$ are $n \times r$ matrices with each
entry being sampled independently from a standard Gaussian
distribution $\mathcal{N}(0, 1)$, unless specified otherwise. Each entry
is revealed independently with probability $\epsilon/n$, so that on
average $n\epsilon$ entries are revealed. The observation is corrupted
by an additive noise matrix $Z$, so that the observation for the index
$(i, j)$ is $M_{ij} + Z_{ij}$.
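
The construction of a random test instance can be sketched as follows. This is a minimal NumPy illustration under the assumption of i.i.d. Gaussian noise with standard deviation `sigma`; the function name `make_instance` and its signature are hypothetical and are not part of the simulation platform used here.

```python
import numpy as np

def make_instance(n, r, eps, sigma, rng):
    """Random rank-r matrix, random sampling mask, and noisy observations."""
    U = rng.standard_normal((n, r))
    V = rng.standard_normal((n, r))
    M = U @ V.T                                   # rank-r test matrix
    mask = rng.random((n, n)) < eps / n           # each entry revealed w.p. eps/n
    Z = sigma * rng.standard_normal((n, n))       # assumed Gaussian noise
    N_E = np.where(mask, M + Z, 0.0)              # observed matrix, zero elsewhere
    return M, mask, N_E

rng = np.random.default_rng(0)
M, mask, N_E = make_instance(n=500, r=4, eps=40, sigma=1.0, rng=rng)

rank_M = np.linalg.matrix_rank(M)
avg_per_row = mask.sum() / 500                    # close to eps
signal_power = (M ** 2).mean()                    # close to r
print(rank_M, avg_per_row, signal_power)
```

Since each entry of $M$ is a sum of $r$ products of independent standard Gaussians, its second moment is $r$, which fixes the signal power used in the SNR.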

In the standard scenario, we typically make the following
three assumptions on the noise matrix $Z$. (1) The noise $Z_{ij}$
does not depend on the value of the matrix $M_{ij}$. (2) The
entries of $Z$, $\{Z_{ij}\}$, are independent. (3) The distribution of
each entry of $Z$ is Gaussian. The above matrix completion
algorithms are expected to be especially effective under this
standard scenario for the following two reasons. First, the
squared error objective function that the algorithms minimize
is well suited for Gaussian noise. Second, the independence
of the $Z_{ij}$'s ensures that the noise matrix is almost full rank and
the singular values are evenly distributed. This implies that, for
a given noise power $\|Z\|_F$, the spectral norm $\|Z\|_2$ is much
smaller than $\|Z\|_F$. In the following, we fix $m = n = 500$
and $r = 4$, and study how the performance changes with
different noise. Each of the simulation results is averaged
over 10 instances and is shown with respect to two basic
parameters, the average number of revealed entries per row
$\epsilon$ and the signal-to-noise ratio, $\mathrm{SNR} = \mathbb{E}[\|M\|_F^2]/\mathbb{E}[\|Z\|_F^2]$.

Fig. 1. The RMSE (above) and the computation time in seconds
(below) as a function of the average number of observed entries per
row $\epsilon$ for SNR = 4 under the standard scenario.

1) Standard scenario: In this standard scenario, the noise
$Z_{ij}$'s are distributed as i.i.d. Gaussian $\mathcal{N}(0, \sigma^2)$. Note that the
SNR is equal to $4/\sigma^2$. There is a basic trade-off between two
metrics of interest: the accuracy of the estimation, measured
using RMSE, and the computational complexity, measured by
the running time in seconds.

In order to interpret the simulation results, they are compared
to the RMSE achieved by the oracle and by a simple rank-$r$
projection algorithm defined as in Eq. (8). The rank-$r$ projection
algorithm simply computes $\mathcal{P}_r(N^E)$. The oracle has prior
knowledge of the linear subspace spanned by
$\{U X^T + Y V^T : X \in \mathbb{R}^{n \times r}, Y \in \mathbb{R}^{m \times r}\}$,
and the RMSE of the oracle estimate
is $\sigma \sqrt{(2nr - r^2)/(n\epsilon)}$ [7].
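
For reference, the oracle RMSE $\sigma \sqrt{(2nr - r^2)/(n\epsilon)}$ is easy to evaluate for the parameters of this section. The short sketch below does so in plain Python; the function name `oracle_rmse` is an assumption of this example.

```python
import math

def oracle_rmse(n, r, eps, sigma):
    """Oracle RMSE sigma * sqrt((2 n r - r^2) / (n eps))."""
    return sigma * math.sqrt((2 * n * r - r ** 2) / (n * eps))

# Parameters of this section: n = 500, r = 4, and SNR = 4 / sigma^2 = 4,
# that is sigma = 1, with eps = 40 revealed entries per row on average.
value = oracle_rmse(n=500, r=4, eps=40, sigma=1.0)
print(value)   # sqrt(3984 / 20000), roughly 0.446

# Quadrupling eps halves the oracle RMSE (it scales as 1 / sqrt(eps)).
ratio = oracle_rmse(500, 4, 160, 1.0) / value
print(ratio)
```

The numerator $2nr - r^2$ is the number of degrees of freedom of an $n \times n$ rank-$r$ matrix, so the oracle error behaves like noise level times the square root of degrees of freedom per observation.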

Figure 1 shows the performance and the computation time
for each of the algorithms with respect to $\epsilon$ under the standard
scenario for fixed SNR = 4. For most values of $\epsilon$, the simple
rank-$r$ projection has the worst performance. However, when
all the entries are revealed and the noise is i.i.d. Gaussian, the
rank-$r$ projection coincides with the oracle bound, which in
this simulation corresponds to the value $\epsilon = 500$. Note that the
behavior of the performance curves of FPCA, ADMiRA, and
OptSpace with respect to $\epsilon$ is similar to the oracle bound,
which is proportional to $1/\sqrt{\epsilon}$.

Among the three algorithms, FPCA has the largest RMSE,
and OptSpace is very close to the oracle bound for all values
of $\epsilon$. Note that when all the values are revealed, ADMiRA is
an efficient way of implementing rank-$r$ projection, and the
performances are expected to be similar. This is confirmed
by the observation that for $\epsilon > 400$ the two curves are
almost identical. One of the reasons why the RMSE of
FPCA does not decrease with $\epsilon$ for large values of $\epsilon$ is that
FPCA overestimates the rank and returns estimated matrices
with rank much higher than $r$, whereas the rank estimation
algorithm used for ADMiRA and OptSpace always returned
the correct rank $r$ for $\epsilon > 80$.

The second plot in Figure 1 shows the average running
time of the algorithms with respect to $\epsilon$. Note that, due to the
large difference between the running times of the three algorithms,
the time is displayed in log scale. For most of the simulations,
ADMiRA had the shortest running time and FPCA the longest,
and the gap was noticeably large, as clearly shown in the figure.
For FPCA and OptSpace, the computation time increased
with $\epsilon$, whereas ADMiRA had a relatively stable computation
time, independent of $\epsilon$.

Figure 2 shows the performance and computation time for
each of the algorithms against the SNR within the standard
scenario for fixed $\epsilon = 40$. The behavior of the performance
curves of ADMiRA and OptSpace is similar to the oracle
bound, which is linear in $\sigma$, which, in the standard scenario, is
equal to $2/\sqrt{\mathrm{SNR}}$. The performance of the rank-$r$ projection
algorithm is determined by two factors. One is the added noise,
which is linear in $\sigma$, and the other is the error caused by the
erased entries, which is constant, independent of SNR. These
two factors add up, whence the performance curve of the
rank-$r$ projection follows. The RMSE of FPCA does not keep
growing as the SNR falls below 1, not because the estimates
are good, but because the estimated entries get very small,
so that the resulting RMSE is close to
$\sqrt{\mathbb{E}[\|M\|_F^2]/n^2}$, which is 2 in this simulation, regardless of
the noise power. When there is no noise, which corresponds to
the value 1/SNR = 0, FPCA and OptSpace both recover the
original matrix correctly for this chosen value of $\epsilon = 40$. For
all three algorithms, the computation time is larger for smaller
noise, because it takes more iterations until the
stopping criterion is met. Also, for most of the simulations
with different SNR, ADMiRA had the shortest running time and
FPCA the longest.

Fig. 2. The RMSE (above) and the computation time in seconds
(below) as a function of 1/SNR for fixed $\epsilon = 40$ under the standard
scenario.

Fig. 3. The RMSE as a function of the average number of observed
entries per row $\epsilon$ for fixed SNR = 4 within the multiplicative noise
model.

Fig. 4. The RMSE as a function of the average number of observed
entries per row $\epsilon$ for fixed SNR = 4 with outliers (above) and the
RMSE as a function of 1/SNR for fixed $\epsilon = 40$ with outliers (below).



2) Multiplicative Gaussian noise: In sensor network localization
[16], where the entries of the matrix correspond to
the pairwise distances between the sensors, the observation
noise is oftentimes assumed to be multiplicative. In formulae,
$Z_{ij} = \xi_{ij} M_{ij}$, where the $\xi_{ij}$'s are distributed as i.i.d. Gaussian
with zero mean. The variance of the $\xi_{ij}$'s is chosen to be $1/r$ so
that the resulting noise power is one. Note that in this case, the
$Z_{ij}$'s are mutually dependent through the $M_{ij}$'s and the values of
the noise also depend on the value of the matrix entry $M_{ij}$.
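
A short numerical check of the multiplicative model: with $M = UV^T$ as in the standard scenario, each entry of $M$ has second moment $r$, so multiplying by a Gaussian factor of variance $1/r$ gives unit noise power. The sketch below, with variable names chosen for this illustration, verifies this empirically.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 4

M = rng.standard_normal((n, r)) @ rng.standard_normal((n, r)).T
xi = rng.standard_normal((n, n)) / np.sqrt(r)   # zero mean, variance 1/r
Z = xi * M                                      # multiplicative noise

noise_power = (Z ** 2).mean()                   # close to 1
snr = (M ** 2).mean() / noise_power             # close to r = 4
print(noise_power, snr)
```

Unlike additive Gaussian noise, the magnitude of $Z_{ij}$ here follows that of $M_{ij}$: large entries of the matrix are corrupted by large noise.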

Figure 3 shows the RMSE with respect to $\epsilon$ under multiplicative
Gaussian noise. The RMSE of the rank-$r$ projection
for $\epsilon = 40$ is larger than 1.5 and is omitted in the figure.
The bottommost line corresponds to the oracle performance
under the standard scenario, and is displayed here, and in all of
the following figures, to serve as a reference for comparison.
The main difference with respect to Figure 1 is that all the
performance curves are larger under multiplicative noise. For
the same value of SNR, it is more difficult to distinguish
the noise from the original matrix, since the noise is now
correlated with the matrix $M$.

Fig. 5. The RMSE as a function of the average number of observed
entries per row $\epsilon$ for fixed SNR = 4 with quantization.

Fig. 6. The RMSE as a function of the average number of observed
entries per row $\epsilon$ for fixed SNR = 4 with ill-conditioned matrices.

3) Outliers: In structure from motion [2], the entries of
the matrix correspond to the positions of points of interest in
2-dimensional images captured by cameras at different angles
and locations. However, due to failures in the feature extraction
algorithm, some of the observed positions are corrupted by
large noise, whereas most of the observations are noise free.
To account for such outliers, we use the following model.

$$ Z_{ij} = \begin{cases} a & \text{with probability } 1/200 , \\ -a & \text{w.p. } 1/200 , \\ 0 & \text{w.p. } 99/100 . \end{cases} $$

The value of $a$ is chosen according to the target SNR $= 400/a^2$.
The noise is clearly independent of the matrix entries and the
$Z_{ij}$'s are mutually independent, but the distribution is now
non-Gaussian.
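
The outlier model can be sampled as in the sketch below. The noise power of the model is $a^2/100$, so with signal power 4 the relation SNR $= 400/a^2$ gives $a = 20/\sqrt{\mathrm{SNR}}$. The function name `outlier_noise` and the `signal_power` argument are assumptions of this example.

```python
import numpy as np

def outlier_noise(shape, snr, rng, signal_power=4.0):
    """Sample Z with entries +a or -a w.p. 1/200 each and 0 otherwise."""
    # Noise power is a^2 / 100, so SNR = 100 * signal_power / a^2.
    a = np.sqrt(100.0 * signal_power / snr)
    u = rng.random(shape)
    Z = np.zeros(shape)
    Z[u < 1 / 200] = a
    Z[(u >= 1 / 200) & (u < 1 / 100)] = -a
    return Z, a

rng = np.random.default_rng(0)
Z, a = outlier_noise((500, 500), snr=4.0, rng=rng)

fraction_corrupted = np.count_nonzero(Z) / Z.size   # close to 1/100
print(a, fraction_corrupted)
```

For SNR = 4 the outlier amplitude is $a = 10$, five times the typical entry magnitude of 2, while about 99% of the observations remain noise free.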

Figure 4 shows the performance of the algorithms with
respect to $\epsilon$ and the SNR with outliers. Comparing the first
figure to Figure 1, we can see that the performance for
large values of $\epsilon$ is less affected by outliers compared to
small values of $\epsilon$. The second figure clearly shows how
the performance degrades for non-Gaussian noise when the
number of samples is small. The algorithms minimize the
squared error $\|\mathcal{P}_E(X) - \mathcal{P}_E(N)\|_F^2$, as in the formulations of
Section II-B. For outliers, a more suitable approach would be to
minimize the $\ell_1$-norm of the errors instead of the $\ell_2$-norm.
Hence, in this simulation with outliers, we can see that the
performance of the rank-$r$ projection, ADMiRA and OptSpace
is worse than in the Gaussian noise case. However, the
performance of FPCA is almost the same as in the standard scenario.

4) Quantization noise: One common model for noise is
quantization noise. For a regular quantization, we choose
a parameter $a$ and quantize the matrix entries to the nearest
value in $\{\ldots, -a/2, a/2, 3a/2, 5a/2, \ldots\}$. The parameter $a$
is chosen carefully such that the resulting SNR is 4. The
performance for this quantization is expected to be worse
than in the multiplicative noise case, since now the noise is
deterministic and completely depends on the matrix entries
$M_{ij}$, whereas in the multiplicative noise model it was random.
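
The quantization rule above maps each entry to the nearest point of a grid with spacing $a$ that is offset by $a/2$. A minimal Python sketch follows; the function name `quantize` is an assumption, and ties (entries exactly halfway between two grid points) are resolved upward here, a choice not specified in the text.

```python
import math

def quantize(x, a):
    """Nearest point of the grid {..., -a/2, a/2, 3a/2, ...} to x."""
    return a * (math.floor(x / a) + 0.5)

a = 1.0
samples = [0.1, -0.1, 1.2, -2.7]
quantized = [quantize(x, a) for x in samples]
print(quantized)

# The quantization noise is a deterministic function of the entry and
# never exceeds a / 2 in magnitude.
max_error = max(abs(q - x) for q, x in zip(quantized, samples))
print(max_error)
```

Because the error is fully determined by the entry value, it cannot be averaged out in the way independent random noise can.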

Figure 5 shows the performance against $\epsilon$ with quantization
noise. The overall behavior of the performance curves is
similar to Figure 1, but all the curves are shifted up. Note that
the bottommost line is the oracle performance in the standard
scenario, which is the same in all the figures. Compared to
Figure 3, for the same value of SNR = 4, quantization is much
more damaging than the multiplicative noise, as expected.

5) Ill-conditioned matrices: In this simulation, we look
at how the performance degrades under the standard scenario
if the matrix $M$ is ill-conditioned. $M$ is generated as
$M = \sqrt{4/166} \, U \operatorname{diag}([1, 4, 7, 10]) V^T$, where $U$ and $V$ are
generated as in the standard scenario. The resulting matrix has
condition number 10 and the normalization constant $\sqrt{4/166}$
is chosen such that $\mathbb{E}[\|M\|_F]$ is the same as in the standard
case.
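
The normalization can be verified numerically: the squares of the diagonal entries sum to $1 + 16 + 49 + 100 = 166$, so scaling by $\sqrt{4/166}$ restores an average squared entry of 4, as in the standard scenario with $r = 4$. The following NumPy sketch, with variable names chosen for this illustration, generates one such matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
d = np.array([1.0, 4.0, 7.0, 10.0])             # prescribed spectrum shape

U = rng.standard_normal((n, 4))
V = rng.standard_normal((n, 4))
scale = np.sqrt(4.0 / (d ** 2).sum())           # sqrt(4 / 166)
M = scale * (U * d) @ V.T                       # U diag(d) V^T, rescaled

s = np.linalg.svd(M, compute_uv=False)
cond = s[0] / s[3]                              # roughly 10
signal_power = (M ** 2).mean()                  # roughly 4
print(cond, signal_power)
```

Since the Gaussian factors $U$ and $V$ are only approximately orthogonal, the condition number of a sampled matrix is close to, but not exactly, 10.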

Figure 6 shows the performance with respect to $\epsilon$ with an
ill-conditioned matrix $M$. The performance of OptSpace is
similar to that of ADMiRA for many values of $\epsilon$. However, a
modification of OptSpace called Incremental OptSpace
achieves a better performance in this case of an ill-conditioned
matrix. The Incremental OptSpace algorithm starts by
finding a rank-1 approximation to $N^E$ and incrementally
finds higher rank approximations; it has more robust performance
when $M$ is ill-conditioned, but is computationally more
expensive.

B. Numerical results with real data matrices 

In this section, we consider low-rank matrix completion
problems in the context of recommender systems, based on
two real data sets: the Jester joke data set [17] and the
Movielens data set [18]. The Jester joke data set contains
$4.1 \times 10^6$ ratings for 100 jokes from 73,421 users.¹ Since
the number of users is large compared to the number of jokes,
we randomly select $n_u \in \{100, 1000, 2000, 4000\}$ users for
comparison purposes. As in [10], we randomly choose two
ratings for each user as a test set, and this test set, which
we denote by $T$, is used in computing the prediction error
in Normalized Mean Absolute Error (NMAE).

¹ The dataset is available at http://www.ieor.berkeley.edu/~goldberg/jester-data/

The Mean Absolute Error (MAE) is defined as in [10], [19]:

$$ \mathrm{MAE} = \frac{1}{|T|} \sum_{(i,j) \in T} |M_{ij} - \hat{M}_{ij}| , $$

where $M_{ij}$ is the original rating in the data set and $\hat{M}_{ij}$ is the
predicted rating for user $i$ and item $j$. The Normalized Mean
Absolute Error (NMAE) is defined as

$$ \mathrm{NMAE} = \frac{\mathrm{MAE}}{M_{\max} - M_{\min}} , $$

where $M_{\max}$ and $M_{\min}$ are upper and lower bounds for
the ratings. In the case of Jester joke, all the ratings are in
$[-10, 10]$, which implies that $M_{\max} = 10$ and $M_{\min} = -10$.
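
The two error measures can be computed as in the following plain Python sketch. The dictionary-based representation of ratings keyed by (user, item) pairs, the function name `nmae`, and the toy numbers are assumptions made for this illustration.

```python
def nmae(actual, predicted, test_set, r_min=-10.0, r_max=10.0):
    """Normalized mean absolute error over the test set T."""
    mae = sum(abs(actual[ij] - predicted[ij]) for ij in test_set) / len(test_set)
    return mae / (r_max - r_min)

# Toy example with two held-out ratings on the Jester scale [-10, 10].
actual = {(0, 1): 5.0, (2, 3): -3.0}
predicted = {(0, 1): 3.0, (2, 3): 1.0}
test_set = [(0, 1), (2, 3)]

score = nmae(actual, predicted, test_set)
print(score)   # MAE = (2 + 4) / 2 = 3, so NMAE = 3 / 20
```

Dividing by the rating range makes the NMAE comparable across data sets with different rating scales.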
The numerical results for the Jester joke data set using
Incremental OptSpace, FPCA and ADMiRA are presented
in the first four rows of the table above. The number of
jokes $m$ is fixed at 100, and the number of users $n_u$ and the
number of samples are given in the first two columns. The
resulting NMAE of each algorithm is shown in the table. To
get an idea of how good the predictions are, consider the
case where each missing entry is predicted with a random
number drawn uniformly at random in $[-10, 10]$ and the actual
rating is also a random number with the same distribution. After
a simple computation, we can see that the resulting NMAE
of the random prediction is 0.333. As another comparison, for
the same data set with $n_u = 18000$, a simple nearest neighbor
algorithm and Eigentaste both yield NMAE of 0.187 [19].
The NMAE of Incremental OptSpace is lower than these
simple algorithms even for $n_u = 100$ and tends to decrease
with $n_u$.
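
The value 0.333 for random predictions follows from the fact that the expected absolute difference of two independent uniform variables on an interval of length $L$ is $L/3$; here $L = 20$, so the MAE is $20/3$ and the NMAE is $1/3$. The Monte Carlo sketch below, whose seed and number of trials are arbitrary choices, checks this numerically.

```python
import random

random.seed(0)
trials = 200_000

# Both the prediction and the actual rating are uniform on [-10, 10].
total = sum(
    abs(random.uniform(-10, 10) - random.uniform(-10, 10))
    for _ in range(trials)
)
nmae_random = total / trials / 20.0   # divide MAE by the rating range
print(nmae_random)                    # close to 1/3
```

Any useful predictor on this rating scale should therefore achieve an NMAE well below one third.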

Looking at a complete matrix where all the entries are
known can bring some insight into the structure of real data
matrices. With the Jester joke data set, we deleted all users
containing missing entries, and generated a complete matrix
$M$ with 14,116 users and 100 jokes. The distribution of the
singular values of $M$ is shown in Figure 7. We must point out
that this rating matrix is not low-rank or even approximately
low-rank, although it is common to make such assumptions.
This is one of the difficulties in dealing with real data. The
other aspect is that the samples are not drawn uniformly at
random as commonly assumed in [6], [5].

Numerical simulation results on the Movielens data set
are also shown in the last row of the above table. The data
set contains 100,000 ratings for 1,682 movies from 942
users.² We use 80,000 randomly chosen ratings to estimate
the 20,000 ratings in the test set, which are called u1.base
and u1.test, respectively, in the Movielens data set. In the last
row of the above table, we compare the resulting NMAE using
Incremental OptSpace, FPCA and ADMiRA.

² The dataset is available at http://www.grouplens.org/node/73

Fig. 7. Distribution of the singular values of the complete submatrix
in the Jester joke data set.